**Job Script Examples**
--------------------------

**Basic examples**
=====================

**General blueprint for a jobscript**

You can save the following example to a file (e.g. testrun.sh) on FEDGEN HPC.
Comment out the two cp commands, which are there for illustration only
(lines 46 and 55), and change the SBATCH directives where applicable. You can
then run the script by typing:

.. code-block:: console

   $ sbatch testrun.sh

Please note that all values that you define with SBATCH directives are hard
limits. If, for example, you ask for 4000 MB of memory (--mem=4000MB) and your
job uses more than that, the job will be killed automatically by the resource
manager.

.. code-block:: bash

   #!/bin/bash -l

   ##############################
   #       Job blueprint        #
   ##############################

   # Give your job a name, so you can recognize it in the queue overview
   #SBATCH --job-name=example

   # Define, how many nodes you need. Here, we ask for 1 node.
   # Each node has 16 or 20 CPU cores.
   #SBATCH --nodes=1
   # You can further define the number of tasks with --ntasks-per-*
   # See "man sbatch" for details. e.g. --ntasks=4 will ask for 4 cpus.

   # Define, how long the job will run in real time. This is a hard cap meaning
   # that if the job runs longer than what is written here, it will be
   # force-stopped by the server. If you make the expected time too long, it will
   # take longer for the job to start. Here, we say the job will take 5 minutes.
   #              d-hh:mm:ss
   #SBATCH --time=0-00:05:00

   # Define the partition on which the job shall run. May be omitted.
   #SBATCH --partition debug

   # How much memory you need.
   # --mem will define memory per node and
   # --mem-per-cpu will define memory per CPU/core. Choose one of those.
   #SBATCH --mem-per-cpu=1500MB
   ##SBATCH --mem=5GB    # this one is not in effect, due to the double hash

   # Turn on mail notification. There are many possible self-explaining values:
   # NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
   # For more values, check "man sbatch"
   #SBATCH --mail-type=END,FAIL

   # You may not place any commands before the last SBATCH directive

   # Define and create a unique scratch directory for this job
   SCRATCH_DIRECTORY=/fedgenscratch/work/${USER}/${SLURM_JOBID}.allot.hpc.fedgen.net
   mkdir -p ${SCRATCH_DIRECTORY}
   cd ${SCRATCH_DIRECTORY}

   # You can copy everything you need to the scratch directory
   # ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
   cp ${SLURM_SUBMIT_DIR}/myfiles*.txt ${SCRATCH_DIRECTORY}

   # This is where the actual work is done. In this case, the script only waits.
   # The time command is optional, but it may give you a hint on how long the
   # command worked
   time sleep 10
   #sleep 10

   # After the job is done we copy our output back to $SLURM_SUBMIT_DIR
   cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

   # In addition to the copied files, you will also find a file called
   # slurm-1234.out in the submit directory. This file will contain all output that
   # was produced during runtime, i.e. stdout and stderr.

   # After everything is saved to the home directory, delete the work directory to
   # save space on /fedgenscratch/work/
   cd ${SLURM_SUBMIT_DIR}
   rm -rf ${SCRATCH_DIRECTORY}

   # Finish the script
   exit 0

**Running many sequential jobs in parallel using job arrays**
==============================================================

In this example we wish to run many similar sequential jobs in parallel using
job arrays. We take Python as an example, but the language does not matter for
the job arrays:

.. code-block:: python

   #!/usr/bin/env python

   import time

   print('start at ' + time.strftime('%H:%M:%S'))

   print('sleep for 10 seconds ...')
   time.sleep(10)

   print('stop at ' + time.strftime('%H:%M:%S'))

Save this to a file called “test.py” and try it out:

.. code-block:: console

   $ python test.py

   start at 15:23:48
   sleep for 10 seconds ...
   stop at 15:23:58

Good.
Now we would like to run this script 16 times at the same time. For this we
use the following script:

.. code-block:: bash

   #!/bin/bash -l

   #####################
   # job-array example #
   #####################

   #SBATCH --job-name=example

   # 16 jobs will run in this array at the same time
   #SBATCH --array=1-16

   # run for five minutes
   #              d-hh:mm:ss
   #SBATCH --time=0-00:05:00

   # 500MB memory per core
   # this is a hard limit
   #SBATCH --mem-per-cpu=500MB

   # you may not place bash commands before the last SBATCH directive

   # define and create a unique scratch directory
   SCRATCH_DIRECTORY=/fedgenscratch/work/${USER}/job-array-example/${SLURM_JOBID}
   mkdir -p ${SCRATCH_DIRECTORY}
   cd ${SCRATCH_DIRECTORY}

   cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}

   # each job will see a different ${SLURM_ARRAY_TASK_ID}
   echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
   python test.py > output_${SLURM_ARRAY_TASK_ID}.txt

   # after the job is done we copy our output back to $SLURM_SUBMIT_DIR
   cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}

   # we step out of the scratch directory and remove it
   cd ${SLURM_SUBMIT_DIR}
   rm -rf ${SCRATCH_DIRECTORY}

   # happy end
   exit 0

Submit the script and after a short while you should see 16 output files in
your submit directory:

.. code-block:: console

   $ ls -l output*.txt

   -rw------- 1 user user 60 Oct 14 14:44 output_1.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_10.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_11.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_12.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_13.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_14.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_15.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_16.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_2.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_3.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_4.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_5.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_6.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_7.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_8.txt
   -rw------- 1 user user 60 Oct 14 14:44 output_9.txt

**Packaging smaller parallel jobs into one large parallel job**
=================================================================

There are several ways to package smaller parallel jobs into one large parallel
job. The preferred way is to use Job Arrays. Browse the web for many examples
on how to do it. Here we want to present a more pedestrian alternative which
can give a lot of flexibility.

In this example we imagine that we wish to run 5 MPI jobs at the same time,
each using 4 tasks, for a total of 20 tasks. Once they finish, we wish to do a
post-processing step and then resubmit another set of 5 jobs with 4 tasks each:

.. code-block:: bash

   #!/bin/bash

   #SBATCH --job-name=example
   #SBATCH --ntasks=20
   #SBATCH --time=0-00:05:00
   #SBATCH --mem-per-cpu=500MB

   cd ${SLURM_SUBMIT_DIR}

   # first set of parallel runs
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &

   wait

   # here a post-processing step
   # ...

   # another set of parallel runs
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &
   mpirun -n 4 ./my-binary &

   wait

   exit 0

The wait commands are important here: the run script will only continue once
all commands started with & have completed.

**Example on how to allocate entire memory on one node**
==========================================================

.. code-block:: bash

   #!/bin/bash -l

   ###################################################
   # Example for a job that consumes a lot of memory #
   ###################################################

   #SBATCH --job-name=example

   # we ask for 1 node
   #SBATCH --nodes=1

   # run for five minutes
   #              d-hh:mm:ss
   #SBATCH --time=0-00:05:00

   # total memory for this job
   # this is a hard limit
   # note that if you ask for more than one CPU has, your account gets
   # charged for the other (idle) CPUs as well
   #SBATCH --mem=31000MB

   # turn on all mail notification
   #SBATCH --mail-type=ALL

   # you may not place bash commands before the last SBATCH directive

   # define and create a unique scratch directory
   SCRATCH_DIRECTORY=/fedgenscratch/work/${USER}/example/${SLURM_JOBID}
   mkdir -p ${SCRATCH_DIRECTORY}
   cd ${SCRATCH_DIRECTORY}

   # we copy everything we need to the scratch directory
   # ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
   cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}

   # we execute the job and time it
   time ./my_binary.x > my_output

   # after the job is done we copy our output back to $SLURM_SUBMIT_DIR
   cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

   # we step out of the scratch directory and remove it
   cd ${SLURM_SUBMIT_DIR}
   rm -rf ${SCRATCH_DIRECTORY}

   # happy end
   exit 0

**How to recover files before a job times out**
==================================================

Possibly you would like to clean up the work directory or recover files for
restart in case a job times out.
In this example we ask Slurm to send a signal to our script 120 seconds before
it times out to give us a chance to perform clean-up actions.

.. code-block:: bash

   #!/bin/bash -l

   # job name
   #SBATCH --job-name=example

   # replace this by your account
   #SBATCH --account=...

   # one core only
   #SBATCH --ntasks=1

   # we give this job 4 minutes
   #SBATCH --time=0-00:04:00

   # asks SLURM to send the USR1 signal 120 seconds before end of the time limit
   #SBATCH --signal=B:USR1@120

   # define the handler function
   # note that this is not executed here, but rather
   # when the associated signal is sent
   your_cleanup_function()
   {
       echo "function your_cleanup_function called at $(date)"
       # do whatever cleanup you want here
   }

   # call your_cleanup_function once we receive USR1 signal
   trap 'your_cleanup_function' USR1

   echo "starting calculation at $(date)"

   # the calculation "computes" (in this case sleeps) for 1000 seconds
   # but we asked slurm only for 240 seconds so it will not finish
   # the "&" after the compute step and "wait" are important
   sleep 1000 &
   wait

**OpenMP and MPI**
=======================

You can download the examples given here to a file (e.g. mpijob.sh) and start
it with:

.. code-block:: console

   $ sbatch mpijob.sh

**Example for an OpenMP job**

.. code-block:: bash

   #!/bin/bash -l

   #############################
   # example for an OpenMP job #
   #############################

   #SBATCH --job-name=example

   # we ask for 1 task with 20 cores
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=20

   # exclusive makes all memory available
   #SBATCH --exclusive

   # run for five minutes
   #              d-hh:mm:ss
   #SBATCH --time=0-00:05:00

   # turn on all mail notification
   #SBATCH --mail-type=ALL

   # you may not place bash commands before the last SBATCH directive

   # define and create a unique scratch directory
   SCRATCH_DIRECTORY=/fedgenscratch/work/${USER}/example/${SLURM_JOBID}
   mkdir -p ${SCRATCH_DIRECTORY}
   cd ${SCRATCH_DIRECTORY}

   # we copy everything we need to the scratch directory
   # ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
   cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}

   # we set OMP_NUM_THREADS to the number of available cores
   export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

   # we execute the job and time it
   time ./my_binary.x > my_output

   # after the job is done we copy our output back to $SLURM_SUBMIT_DIR
   cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

   # we step out of the scratch directory and remove it
   cd ${SLURM_SUBMIT_DIR}
   rm -rf ${SCRATCH_DIRECTORY}

   # happy end
   exit 0

**Example for an MPI job**

.. code-block:: bash

   #!/bin/bash -l

   ##########################
   # example for an MPI job #
   ##########################

   #SBATCH --job-name=example

   # 80 MPI tasks in total
   # FEDGEN HPC has 16 or 20 cores/node and therefore we take
   # a number that is divisible by both
   #SBATCH --ntasks=80

   # run for five minutes
   #              d-hh:mm:ss
   #SBATCH --time=0-00:05:00

   # 500MB memory per core
   # this is a hard limit
   #SBATCH --mem-per-cpu=500MB

   # turn on all mail notification
   #SBATCH --mail-type=ALL

   # you may not place bash commands before the last SBATCH directive

   # define and create a unique scratch directory
   SCRATCH_DIRECTORY=/fedgenscratch/work/${USER}/example/${SLURM_JOBID}
   mkdir -p ${SCRATCH_DIRECTORY}
   cd ${SCRATCH_DIRECTORY}

   # we copy everything we need to the scratch directory
   # ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
   cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}

   # we execute the job and time it
   time mpirun -np $SLURM_NTASKS ./my_binary.x > my_output

   # after the job is done we copy our output back to $SLURM_SUBMIT_DIR
   cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

   # we step out of the scratch directory and remove it
   cd ${SLURM_SUBMIT_DIR}
   rm -rf ${SCRATCH_DIRECTORY}

   # happy end
   exit 0

**Example for a hybrid MPI/OpenMP job**

.. code-block:: bash

   #!/bin/bash -l

   #######################################
   # example for a hybrid MPI OpenMP job #
   #######################################

   #SBATCH --job-name=example

   # we ask for 4 MPI tasks with 10 cores each
   #SBATCH --nodes=2
   #SBATCH --ntasks-per-node=2
   #SBATCH --cpus-per-task=10

   # run for five minutes
   #              d-hh:mm:ss
   #SBATCH --time=0-00:05:00

   # 500MB memory per core
   # this is a hard limit
   #SBATCH --mem-per-cpu=500MB

   # turn on all mail notification
   #SBATCH --mail-type=ALL

   # you may not place bash commands before the last SBATCH directive

   # define and create a unique scratch directory
   SCRATCH_DIRECTORY=/fedgenscratch/work/${USER}/example/${SLURM_JOBID}
   mkdir -p ${SCRATCH_DIRECTORY}
   cd ${SCRATCH_DIRECTORY}

   # we copy everything we need to the scratch directory
   # ${SLURM_SUBMIT_DIR} points to the path where this script was submitted from
   cp ${SLURM_SUBMIT_DIR}/my_binary.x ${SCRATCH_DIRECTORY}

   # we set OMP_NUM_THREADS to the number of cpu cores per MPI task
   export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

   # we execute the job and time it
   time mpirun -np $SLURM_NTASKS ./my_binary.x > my_output

   # after the job is done we copy our output back to $SLURM_SUBMIT_DIR
   cp ${SCRATCH_DIRECTORY}/my_output ${SLURM_SUBMIT_DIR}

   # we step out of the scratch directory and remove it
   cd ${SLURM_SUBMIT_DIR}
   rm -rf ${SCRATCH_DIRECTORY}

   # happy end
   exit 0

If you want to start more than one MPI rank per node you can use
--ntasks-per-node in combination with --nodes:

.. code-block:: bash

   #SBATCH --nodes=4 --ntasks-per-node=2 --cpus-per-task=8

This will start 2 MPI tasks on each of 4 nodes, where each task can use up to
8 threads.

**Message passing example (MPI)**
===================================

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test_mpi
   #SBATCH --output=res_mpi.txt
   #
   #SBATCH --ntasks=4
   #SBATCH --time=10:00
   #SBATCH --mem-per-cpu=100

   module load OpenMPI
   srun hello.mpi

Request four cores on the cluster for 10 minutes, using 100 MB of RAM per core.
Assuming hello.mpi was compiled with MPI support, srun will create four
instances of it, on the nodes allocated by Slurm.

You can try the above example by downloading the example hello world program
from Wikipedia (name it for instance wiki_mpi_example.c), and compiling it with

.. code-block:: console

   module load OpenMPI
   mpicc wiki_mpi_example.c -o hello.mpi

The res_mpi.txt file should contain something like

.. code-block:: text

   We have 4 processors
   Hello 1! Processor 1 reporting for duty
   Hello 2! Processor 2 reporting for duty
   Hello 3! Processor 3 reporting for duty

**Shared memory example (OpenMP)**

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test_omp
   #SBATCH --output=res_omp.txt
   #
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=4
   #SBATCH --time=10:00
   #SBATCH --mem-per-cpu=100

   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
   srun ./hello.omp

The job will be run in an allocation where four cores have been reserved on the
same compute node.

You can try it by using the hello world program from Wikipedia (name it for
instance wiki_omp_example.c) and compiling it with

.. code-block:: console

   gcc -fopenmp wiki_omp_example.c -o hello.omp

The res_omp.txt file should contain something like

.. code-block:: text

   Hello World from thread 0
   Hello World from thread 3
   Hello World from thread 1
   Hello World from thread 2
   There are 4 threads

**Embarrassingly parallel workload example (job array)**

This setup is useful for problems based on **random draws** (e.g. Monte-Carlo
simulations). In such cases, you can have four programs each draw 1000 random
samples; combining their output afterwards (with another program) gives the
equivalent of drawing 4000 samples.

Another typical use of this setting is a **parameter sweep**. In this case the
same computation is carried out several times by a given code, differing only
in the initial value of some high-level parameter for each run.
An example could be the optimisation of an integer-valued parameter through
range scanning in a **job array**:

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test_emb_arr
   #SBATCH --output=res_emb_arr.txt
   #
   #SBATCH --ntasks=1
   #SBATCH --time=10:00
   #SBATCH --mem-per-cpu=100
   #
   #SBATCH --array=1-8

   srun ./my_program.exe $SLURM_ARRAY_TASK_ID

In that configuration, the command my_program.exe will be run eight times,
creating eight distinct jobs. Each run receives a different argument, taken
from the environment variable **SLURM_ARRAY_TASK_ID** that Slurm sets to a
value from 1 to 8, as specified by the --array parameter.

The same idea can be used to process **several data files**. Each instance of
the program must be passed a different file to read, selected by the value set
in the $SLURM\_\* environment variable. For instance, assuming there are
exactly eight files in /path/to/data we can create the following script:

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test_emb_arr
   #SBATCH --output=res_emb_arr.txt
   #
   #SBATCH --ntasks=1
   #SBATCH --time=10:00
   #SBATCH --mem-per-cpu=100
   #
   #SBATCH --array=0-7

   FILES=(/path/to/data/*)

   srun ./my_program.exe ${FILES[$SLURM_ARRAY_TASK_ID]}

In this case, eight jobs will be submitted, each with a different filename,
taken from the array FILES[], given as an argument to my_program.exe. As the
FILES[] Bash array is zero-indexed, the Slurm job array IDs must also start at
0, so the argument is --array=0-7. One pain point is that the number of files
in the directory must match the number of jobs in the array.

Note that the same recipe can be used with a numerical argument that is not
simply an integer sequence, by defining a Bash array ARGS[] containing the
desired values:

.. code-block:: bash

   ARGS=(0.05 0.25 0.5 1 2 5 100)

   srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}

Here again, the Slurm job array numbering must start at 0 to make sure all
items in the ARGS[] Bash array are processed.
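The index bookkeeping described above (a zero-indexed list on one side, the --array range on the other) can be checked with a small helper before submitting. The following is only a sketch in Python; the function names array_directive and item_for_task are hypothetical and not part of Slurm or of this guide.

```python
def array_directive(items):
    """Return the #SBATCH --array line that covers every item of a
    zero-indexed list (hypothetical helper, not a Slurm tool)."""
    if not items:
        raise ValueError("need at least one item")
    # A zero-indexed list of n items needs task IDs 0 .. n-1.
    return f"#SBATCH --array=0-{len(items) - 1}"


def item_for_task(items, task_id):
    """Mimic ${ARGS[$SLURM_ARRAY_TASK_ID]}: pick the item for one task ID."""
    return items[task_id]


# The same seven values as in the ARGS[] example above.
ARGS = ["0.05", "0.25", "0.5", "1", "2", "5", "100"]

print(array_directive(ARGS))
for task_id in range(len(ARGS)):
    print(task_id, item_for_task(ARGS, task_id))
```

For the seven values above the helper yields --array=0-6, so the last item (100) is reached by task ID 6. Starting the range at 1 instead would skip the first value and leave the last task with an empty argument.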
**Warning** If the running time of your program is short, say ten minutes or
less, creating a job array will incur a lot of overhead and you should consider
*packing* your jobs.

**Packed jobs example**
==========================

By default, the srun command in a submission script inherits all non-GRES
resources allocated to the job, but with the --exact parameter you can split
the resources and allocate them to multiple steps in parallel.

As an example, the following job submission script will ask Slurm for 8 CPUs,
then it will run the myprog program 1000 times with arguments from 1 to 1000.
With the -N1 -n1 -c1 --exact options, Slurm ensures that at most 8 instances
are running at any point in time, each being allocated one CPU. You can at this
point decide to allocate several CPUs or tasks by adapting the corresponding
parameters.

.. code-block:: bash

   #! /bin/bash
   #
   #SBATCH --ntasks=8

   for i in {1..1000}
   do
      srun -N1 -n1 -c1 --exact ./myprog $i &
   done
   wait

The for-loop can be replaced with GNU parallel if installed on your system:

.. code-block:: bash

   parallel -P $SLURM_NTASKS srun -N1 -c1 -n1 --exact ./myprog ::: {1..1000}

Similarly, many files can be processed with one job submission script. The
following script will run myprog for every file in /path/to/data, but at most
8 at a time, and using one CPU per task.

.. code-block:: bash

   #! /bin/bash
   #
   #SBATCH --ntasks=8

   for file in /path/to/data/*
   do
      srun -N1 -n1 -c1 --exact ./myprog $file &
   done
   wait

Here again the for-loop can be replaced with another command, xargs:

.. code-block:: bash

   # -type f keeps the directory itself out of the list of arguments
   find /path/to/data -type f -print0 | xargs -0 -n1 -P $SLURM_NTASKS srun -N1 -n1 -c1 --exact ./myprog

**Master/worker program example**

.. code-block:: bash

   #!/bin/bash
   #
   #SBATCH --job-name=test_ms
   #SBATCH --output=res_ms.txt
   #
   #SBATCH --ntasks=4
   #SBATCH --time=10:00
   #SBATCH --mem-per-cpu=100

   srun --multi-prog multi.conf

With file multi.conf being, for example, as follows

.. code-block:: text

   0      echo I am the Master
   1-3    echo I am worker %t

The above instructs Slurm to create four tasks (or processes), one running
echo 'I am the Master', and the other 3 running echo I am worker %t. The %t
placeholder will be replaced with the task id. This is typically used in a
**producer/consumer** setup where one program (the master) creates computing
tasks for the other program (the workers) to perform.

Upon completion of the above job, file res_ms.txt will contain

.. code-block:: text

   I am worker 2
   I am worker 3
   I am worker 1
   I am the Master

though not necessarily in the same order.

**Hybrid jobs**
======================

You can mix multi-processing (MPI) and multi-threading (OpenMP) in the same
job, simply like this:

.. code-block:: bash

   #! /bin/bash
   #
   #SBATCH --ntasks=8
   #SBATCH --cpus-per-task=4

   module load OpenMPI
   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
   srun ./myprog

or even a job array of hybrid jobs:

.. code-block:: bash

   #! /bin/bash
   #
   #SBATCH --array=1-10
   #SBATCH --ntasks=8
   #SBATCH --cpus-per-task=4

   module load OpenMPI
   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
   srun ./myprog $SLURM_ARRAY_TASK_ID

**GPU jobs**
=======================

If you want to claim a GPU for your job, you need to specify the GRES (Generic
Resource Scheduling) parameter in your job script. Please note that GPUs are
only available in a specific partition whose name depends on the cluster.

.. code-block:: bash

   #SBATCH --partition=PostP
   #SBATCH --gres=gpu:1

A sample job file requesting a node with a GPU could look like this:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=example
   #SBATCH --ntasks=1
   #SBATCH --time=1:00:00
   #SBATCH --mem-per-cpu=1000
   #SBATCH --partition=gpu
   #SBATCH --gres=gpu:1

   module load CUDA
   srun ./my_cuda_program

**Settings for OpenMP and MPI jobs**
======================================

**Single node jobs**

Single node jobs are for applications that are not optimized for HPC (high
performance computing) systems, such as simple Python or R scripts and much of
the software that is optimized for desktop PCs.

**Simple applications and scripts**

Many simple tools and scripts are not parallelized at all and therefore won’t
profit from more than one CPU core.

+-----------------------+----------------------------------------------------+
| **Parameter**         | **Function**                                       |
+=======================+====================================================+
| --nodes=1             | Start an unparallelized job on only one node       |
+-----------------------+----------------------------------------------------+
| --ntasks-per-node=1   | Only one task is necessary                         |
+-----------------------+----------------------------------------------------+
| --cpus-per-task=1     | Just one CPU core will be used.                    |
+-----------------------+----------------------------------------------------+
| --mem=                | Memory (RAM) for the job. Number followed by unit  |
|                       | prefix, e.g. 16G                                   |
+-----------------------+----------------------------------------------------+

If you are unsure whether your application can benefit from more cores, try a
higher number and observe the load of your job. If it stays at approximately
one, there is no need to ask for more than one.

**OpenMP applications**
==========================

OpenMP (Open Multi-Processing) is a multiprocessing library that is often used
for programs on shared memory systems. Shared memory describes systems which
share the memory between all processing units (CPU cores), so that each process
can access all data on that system.
+-----------------------+----------------------------------------------------+
| **Parameter**         | **Function**                                       |
+=======================+====================================================+
| --nodes=1             | Start a parallel job for a shared memory system    |
|                       | on only one node                                   |
+-----------------------+----------------------------------------------------+
| --ntasks-per-node=1   | For OpenMP, only one task is necessary             |
+-----------------------+----------------------------------------------------+
| --cpus-per-task=      | Number of threads (CPU cores) to use               |
+-----------------------+----------------------------------------------------+
| --mem=                | Memory (RAM) for the job. Number followed by unit  |
|                       | prefix, e.g. 16G                                   |
+-----------------------+----------------------------------------------------+

**Multiple node jobs (MPI)**
==============================

Depending on the communication frequency and bandwidth demands of your setup,
you can either just start a number of MPI tasks or request whole nodes. While
using whole nodes guarantees low latency and high bandwidth, it usually results
in a longer queuing time compared to a cluster-wide job. With the latter, the
SLURM manager can distribute your tasks across all nodes of FEDGEN HPC and
utilize otherwise unused cores on nodes which, for example, run a 16 core job
on a 20 core node. This usually results in shorter queuing times but slower
inter-process connection speeds.

We strongly advise all users to ask for a given set of cores when submitting
multi-core jobs. To make sure that you utilize full nodes, you should ask for
task counts that are multiples of both 16 and 20 (80, 160, etc.) due to the
hardware specifics of FEDGEN HPC, i.e. submit the job with --ntasks=80 **if**
your application scales to this number of tasks. This will make the best use of
the resources and give the most predictable execution times.
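The task counts recommended above are the common multiples of the two node sizes. As a small illustration (a sketch only; the function name full_node_task_counts is hypothetical), they can be computed from the least common multiple of 16 and 20:

```python
from math import gcd


def full_node_task_counts(cores_a=16, cores_b=20, how_many=3):
    """Return the first task counts that fill whole nodes of either size.

    cores_a and cores_b default to the two node sizes named in this guide.
    """
    # Least common multiple of the two node sizes.
    step = cores_a * cores_b // gcd(cores_a, cores_b)
    return [step * k for k in range(1, how_many + 1)]


counts = full_node_task_counts()
print(counts)  # [80, 160, 240]

# 80 tasks fill exactly 5 nodes with 16 cores or 4 nodes with 20 cores.
print(counts[0] // 16, counts[0] // 20)  # 5 4
```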
If your job requires more than the default available memory per core (32
GB/node gives 2 GB/core for 16 core nodes and 1.6 GB/core for 20 core nodes)
you should adjust this need with the following directive:

.. code-block:: bash

   #SBATCH --mem-per-cpu=4GB

When doing this, the batch system will automatically allocate 8 cores or fewer
per node.

**To use whole nodes**

+-----------------------+----------------------------------------------------+
| **Parameter**         | **Function**                                       |
+=======================+====================================================+
| --nodes=              | Start a parallel job for a distributed memory      |
|                       | system on several nodes                            |
+-----------------------+----------------------------------------------------+
| --ntasks-per-node=    | Number of (MPI) processes per node. The maximum    |
|                       | depends on the node type (16 or 20 on FEDGEN HPC)  |
+-----------------------+----------------------------------------------------+
| --cpus-per-task=1     | Use one CPU core per task.                         |
+-----------------------+----------------------------------------------------+
| --exclusive           | Job will not share nodes with other running jobs.  |
|                       | You do not need to specify memory as you will get  |
|                       | all available on the node.                         |
+-----------------------+----------------------------------------------------+

**To distribute your job**

+-----------------------+----------------------------------------------------+
| **Parameter**         | **Function**                                       |
+=======================+====================================================+
| --ntasks=             | Number of (MPI) processes in total. Equal to the   |
|                       | number of cores                                    |
+-----------------------+----------------------------------------------------+
| --mem-per-cpu=        | Memory (RAM) per requested CPU core. Number        |
|                       | followed by unit prefix, e.g. 2G                   |
+-----------------------+----------------------------------------------------+

**Scalability**
=========================

You should run a few tests to find the best fit between minimizing runtime and
making the most of your allocated CPU quota.
That is, you should not ask for more CPUs for a job than you can really utilize
efficiently. Try to run your job on 1, 2, 4, 8, 16, etc., cores to see when the
runtime for your job starts tailing off. When you start to see less than 30%
improvement in runtime when doubling the CPU count, you should probably not go
any further. Recommendations for a few of the most used applications can be
found in the Application guides.
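The 30% rule of thumb can be applied mechanically once you have measured runtimes at doubling core counts. The sketch below assumes timings collected by hand; the function name recommended_cores and the runtimes shown are made up for illustration and are not measurements from FEDGEN HPC.

```python
def recommended_cores(timings, min_gain=0.30):
    """Return the largest core count for which doubling still paid off.

    timings maps core count -> runtime in seconds, measured at doubling
    core counts (1, 2, 4, ...). min_gain is the minimum relative runtime
    improvement required to accept the next doubling.
    """
    cores = sorted(timings)
    best = cores[0]
    for prev, cur in zip(cores, cores[1:]):
        # Relative runtime improvement gained by going from prev to cur cores.
        gain = (timings[prev] - timings[cur]) / timings[prev]
        if gain < min_gain:
            break
        best = cur
    return best


# Hypothetical runtimes in seconds, invented for this example.
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0, 16: 140.0, 32: 130.0}
print(recommended_cores(timings))  # 8
```

With these invented numbers, going from 8 to 16 cores improves the runtime by only about 18%, so the sketch stops at 8 cores.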